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Abstract 

In this work we present 3 case studies of local temperature time series obtained 
from stations in Europe and Israel. The nonlinear nature of the series is pre- 
sented along with model based forecasting. Data is nonlinearly filtered using 
high dimensional projection and analysis is performed on the filtered data. A 
lorenz type model of 3 first order ODEs is then fitted. Forecasts are shown for 
periods of 100 days ahead, outperforming any existing forecast method known 
today. While other models fail at forecasting periods above 11 days, ours shows 
remarkable stability 100 days ahead. 

Thus finally a local dynamical system if found for local temperature fore- 
casting not requiring solution of Navier-Stokes equations. Thus saving compu- 
tational costs. 



1. Introduction 

Weather forecasting is crucial for government, military, industrial and agri- 
cultural applications. As well as an open scientific question. 
Global numerical weather forecasting pioneered by (l2j today is carried out 
using a multitude of methods, each having it's advantages and disadvantages. 
The foremost method consists of the solution of the global flow and energy 
equations of the atmosphere (Navier-Stokes). Due to the inherent complexity of 
these equations, simplifications are used. These include hydrostatic, gcostrophic 
and quasi-gcostrophic approximations. The domain is then divided using a grid 
and equations are solved within the grid and advanced in time, using finite el- 
ements or finite difference methods. Newer methods includ e sp ectral methods, 
which show exponential convergence for smooth problems |15| . 

Manipulating the vast datasets and performing the complex calculations 
necessary to modern numerical weather prediction requires some of the most 
powerful supercomputers in the world. Even with the increasing power of su- 
percomputers, the forecast skill of numerical weather models extends to about 
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only six days. Factors affecting the accuracy of numerical predictions include 
the density and quality of observations used as input to the forecasts, along with 
deficiencies in the numerical models themselves. Although post-processing tech- 
niques such as model output statistics (MOS) have been developed to improve 
the handling of errors in numerical predictions, a more fundamental problem 
lies in the chaotic nature of the partial differential equations used to simulate 
the atmosphere. It is impossible to solve these equations exactly, and small 
errors grow with time (doubling about every five days). In addition, the partial 
differential equations used in the model need to be supplemented with parame- 
terizations for solar radiation, moist processes (clouds and precipitation), heat 
exchange, soil, vegetation, surface water, and the effects of terrain. In an effort 
to quantify the large amount of inherent uncertainty remaining in numerical 
predictions, ensemble forecasts @ have been used since the 1990s to help gauge 
the confidence in the forecast, and to obtain useful results farther into the future 
than otherwise possible. This approach analyzes multiple forecasts created with 
an individual forecast model or multiple models. Monte Carlo simulations are 
then carried out on the forecasts to obtain the statistical distribution of error 
of the forecast. The limitations of ensemble methods are inherent in the global 
modeling approach. 

Other methods for weather forecasting include the "downscaling" of global 
climate models (GCMs) [HI to localized regions using statistical relations be- 
tween predictors and predictands. Predictors are taken from the GCM model 
output such as temperature, pressure at different altitudes and locations. Pre- 
dictands are the local variables to be simulated, these include temperature, 
pressure, wind and precipitation. These methods are mainly used for the study 
of impact of global warming on surface variables. [Io| have shown that these 
models deviate significantly from observed data. Statistical methods of weather 
forecasting from time series are known from the works of and 0. In these 
methods the series is decomposed into cyclical, trend and error terms. The cycli- 
cal component is approximated by a Fourier decomposition, trend with linear 
term and error is approximated using GARCH model. These models however 
don't teach us anything about the dynamics of the system. Further more Garch 
models explode at prediction ranges larger than 11 days. 

In this work we present for the first time the existence of a deterministic ODE 
system describing the evolution of temperature from daily mean temperature 
time series data. Section 2 describes the embeding procedure, section 3 describes 
the model fitting and shows results for forecasting 100 days forward. 

2. Mathematical model 

2.1. Nonlinear filtration of time series 

We assume that the time series are generated by a dynamical system of the 
form: 

^=/i({*i}£i)+&, i = W, ->™ (!) 
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Where £j is the inherent noise of the system which is assumed to be i.i.d. (in- 
dependent, identically distributed). 

And the observable has measurement noise: 

V(t)=v{{xj}? =1 )+t (2) 

In order to obtain the dynamical system, data has to be filtered first. We 
use high-dimensional projection and singular value decomposition (SVD) [l4[ 
to obtain the attractor from the principal directions. I.e. noise is projected 
to high dimensional manifold (dimension 12). Singular value decomposition is 
carried out on the projection to obtain principal directions of attractor. 

This is then projected back to time series as clean signal, filter takes about 
250 iterations to converge to clean signal. 

Numerical analysis was carried out using the TISE AN package Q. TISEAN 
is a nonlinear time series analysis package developed on the principles of (l3j ]. 



11( and |8j. 

Wc sampled mean daily temperature from a few stations. Data was obtained 



from European weather center |http: / /www. knmi.nl Each time series consists 



of 15000 terms. Figs. [TJ [2] and [3] show an excellent fitting of the filtered 
dynamical system to raw data. The first one depicting the original data vs. 
filtered on 2000 points for Berlin, Paris and Tel- Aviv respectively. The second 
figure shows the first 600 points of fig. [TJ remarkable fit is obtained. The third, 
fig. [3] shows the distribution of residuals: 

r(t)=y f (t)-y(t) (3) 

Variances being 10.97, 10.97, 3.28 for Berlin, Paris and Tel-Aviv respectively. 
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Figure 1: Filtered(bluc) vs. raw data(red) of 3 stations, Berlin, Paris, Tel-Aviv, 2000 points 




Figure 2: Filtcrcd(bluc) vs. raw data(red) of 3 stations, Berlin, Paris, Tel-Aviv, zoomed to 
600 points 
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Figure 3: Histograms of residuals of Berlin, Paris, Tel- Aviv, variances are shown in the figure 
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2.2. Embedding results 

We are given the filtered time series x(ti) from previous section which is 
assumed to be generated by ([J), possesing an attractor A with box dimension 
d A . 

Whitney's embedding theorem [l7| states that any smooth n dimensional man- 
ifold can be embedded in 2n + I Euclidean space. I.e. no two points on the 
manifold map to the same point in R 2n+1 

Takens theorem [l6| extends Whitney's theorem such that an attactor A with 
box dimension d A can be embedded in k dimensional Euclidean space provided 
k > 2d A . 

I.e. there exists a diffeomorphism <fi from A into R k . The delay theorem states 
that using the delay vector: 

(x(t), x(t - r), x(t - 2t), x(t - (k - l)r)) 

where r is the delay, reconstructs the system in R k if the dimension of system 
2d A < k. Hence 2 parameters govern the reconstruction of attractor, namely 
r, m the delay and dimension of space. We use autocorrelation function R x (t) 
to compute the delay which matches the decorrelation time 13|, i.e. when 
R x {r) < \- 

Rx{j) = E [(x(t) - y)(x(t + t) ~ v)} (4) 
a 2 

It is evident that — 1 < R x {t) < 1. R x can also be defined using the convolution 
theorem: 

1 r°° 

Rx{r) - t- / S{u)e-^ T du (5) 

Where: 

S(u) = F{x)*F 

/OO 
x(t)e lult dt 

The decorrelation time r has physical significance, it is defined as the time at 
which the phase of the wave has wandered significantly and the wave is no longer 
coherent. Formally it is defined as the inverse of the spectral width of the signal. 

f °°S*(v)du W 

For deterministic monochromatic waves the decorrelation time is infinite. 
For white Grassbergernoisc: 

Mr)=m (8) 
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Decorrelation is immediate. 

We use false nearest neighbor analyis @ to determine the proper embed- 
ding dimension. The idea is quite intuitive. Suppose the minimal embedding 
dimension for a given time series i s tuq. This means that in a mo-dimensional 
delay space the reconstructed attractor is a one-to-one image of the attractor 
in the original phase space. Especially, the topological properties are preserved. 
Thus the neighbors of a given point are mapped onto neighbors in the delay 
space. Due to the assumed smoothness of the dynamics, neighborhoods of the 
points are mapped onto neighborhoods again. Of course the shape and the di- 
ameter of the neighborhoods is changed according to the Lyapunov exponents. 
But suppose now you embed in an m-dimensional space with m < mo. Due 
to this projection the topological structure is no longer preserved. Points are 
projected into neighborhoods of other points to which they wouldn't belong in 
higher dimensions. These points are called false neighbors. If now the dynam- 
ics is applied, these false neighbors are not typically mapped into the image of 
the neighborhood, but somewhere else, so that the average "diameter" becomes 
quite large. For each point in the series we compute the nearest points and pick 
the one with the minimal Euclidean distance and compute the distance in the 
next iteration. 

_ ||Xj + i -Xj + i|| 

If Ri exceeds a certain heuristic then Xj IS cL false neighbor. 

When the fraction of false nearest neighbors shows first local minimum the 
dimension m is determined. 

The correlation dimension is computed using [ill ]. The correlation integral: 

C(r) = J d/x(x) J 0(r - ||x - y|)d/x(y) (10) 

Where /x is the probability measure of the set of points and is the Heaviside 
step function. Basically C(r) counts points of distance less than r. It is assumed 
in [TO] that: 

C(r) ~ r D (11) 
Thus the correlation dimension is defined as: 

B=V,,n<-^ (12, 

r->o log(r) 

Fig. U shows the autocorrelation function R x (t) as function of r. Decorre- 
lation delay is r = 65. Further more, local maxima at r = 365n where n is an 
integer, indicate annual periodicity of time series. It is notable that all stations 
display same characteristics. 
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Figure 5: False nearest neighbors fraction, Berlin, Paris and Tel-Aviv stations 



We also performed false nearest neighbor analysis on the readings. Since the 
data is assumed noisy we look at the first minimum. Fig. [3] show the dimension 
in all stations to be m = 3. 

Next we performed the actual delay map and computed the positive Lya- 



punov exponents and Grassbcrgcr-Procaccia [llj correlation dimenions which 
are the topological invariants of attractors. 

Leading Lyapunov exponents: Table !2.2l shows the leading Lyapunov expo- 
nent for the stations. Figs. 0] to [7] and tables 12.21 and 12.21 show that the 



Table 1: Positive Lyapunov exponents for Tel- Aviv, Paris and Berlin 



Berlin 


Paris 


Tel-Aviv 


0.246 


0.250 


0.062 


0.242 


0.243 


0.060 



Table 2: Grassberger-Procaccia correlations for Tel-Aviv, Paris and Berlin 



Tel-Aviv 


Paris 


Berlin 


1.6 


1.7 


1.3 



same dynamical system controls the local temperature at 3 different stations 
on earth. We have also computed the autocorrelation function of the residual 
noise r(t), equation ([3]). Fig. [8] shows that the autocorrelation function for all 
3 stations is that of white noise. This establishes our claim at equation ([!}. 
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Figure 6: Reconstructed attractor, Berlin, Paris and Tel-Aviv stations 
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Figure 7: Correlation sums for Berlin, Paris and Tel-Aviv stations 
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Figure 8: Autocorrelation functions of r(t), Berlin, Paris and Tel-Aviv stations 



3. Model based prediction 

3.1. Fitting of set of first order ODEs 

We assume our reconstructed dynamical system obeys bilinear (generalized 
Lorenz) + trend + 365 days period forcing terms: 

^=f i ({x j } 3 j=1 ,a), . = 1,2,3 (13) 

, ,» 27r(£mod 365) . 
fi{{xj}j=i,a) = a l00 + ctijoXj + ctijkXjXk + Pi cos — (14) 

Where summation convention is applied. 

We are given the phase reconstructed data from previous section fig. [5] 
We need to match the a parameters tensor and the given data. Let us 

vectorize the tensor a. 

a Lk+3j+9i = a m ( 15 ) 

We use the method employed by [3] and define the cost function: 

n 3 

H({ Xi }U,a*) = (^(tO - f 3 {a*)f (16) 

i=i j=i 
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Gauss- Newton method [4[ is used to minimize the cost function H(x, y, z,a*). 
Assume an initial parameter vector and use the Taylor expansion of H (x, y, z, a* 
with respect to a*(i) is: 

H{{ Xi }U,u*) = H({ Xl }U,<) + E Q-^jj («*(?) - «S(J))+ 



1 /} 2 77 

2 E E ( 2 ^ ^) ^ (j )^( fc ) ^( fc ) - «5(*))(«*(?) - o8(j)) + 0(8a 



(17) 



Next, we solve m equations: 



2EE(4W -/*(«')) ^ = (18) 



The time derivatives are taken using difference scheme: 

/ / . n _ + 8x a (t j+ i) - 8x a (tj_i) + a; a (tj + i) 

Xa[ j) ~ 12 At 



(19) 



Performing the minimization procedure on learning set of 14000 days we obtain 
a solution for a . System is then integrated forward in time, system is stiff 
(positive Lyapunov exponents (table I2.20 ) and solution explodes after 150 days 
ahead. ODE fit is shown for Berlin, Paris and Tel- Aviv in figs. [9] and [TOl 100 
days ahead. 

Fig. |9] shows the fit of solution of odes to reconstructed filtered data and 
raw data. The trajectory fits well to the filtered data and passes in midline of 
oscillations of raw data. Fig. [TU] shows the relative squared error of fitted vs. 
filtered data for prediction 100 days ahead. 
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Figure 9: Measured (green) vs. filtered (red) and predicted solution of odes (blue) for Berlin, 
Paris and Tel- Aviv 



Berlin Paris x ^Jel-Aviv 




t (days) t (days) t (days) 



Figure 10: Relative squared error for fit vs. filtered data for Berlin, Paris and Tel-Aviv 
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4. Concluding remarks 

We have established in this work the existence of a deterministic dynamical 
system governing the evolution of local temperatures. This dynamical system 
yields a very good fit to the nonlincarly filtered data. The residual noise (differ- 
ence between dynamical system and raw data) is bounded. We have also shown 
using autocorrelation and distribution fittng that residual noise is Gaussian. 
Further work is required to improve fitting of dynamical system to the filtered 
data. Further work is also required to derive a complete stochastic differential 
equation with a mean reverting process. 
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